On Bayesian Nonparametric Continuous 
Time Series Models 



George Karabatsos

University of Illinois-Chicago 

and 

Stephen G. Walker

University of Kent, United Kingdom 

March 2, 2013 



Abstract: This paper is a note on the use of Bayesian nonparametric mixture models for 
continuous time series. We identify a key requirement for such models, and then establish 
that there is a single type of model which meets this requirement. As it turns out, the model 
is well known in multiple change-point problems. 

1 Introduction



Much recent research has focused on the development of Bayesian nonparametric,
countably-infinite mixture models for time series data. This work has aimed to relax the
normality assumptions of the general class of dynamic linear models (West & Harrison, 1997),
which already encompasses traditional normal (time-static) linear regression, autoregressions,
autoregressive moving average (ARMA) models, and nonstationary polynomial trend and
time-series models.

These Bayesian nonparametric (infinite-mixture) time series models have the general 
form: 

$$
\int f(y_t \mid x_t, \gamma, \theta)\, dG_t(\theta) = \sum_{j=1}^{\infty} f(y_t \mid x_t, \gamma, \theta_{tj})\, \omega_j(t),
$$

given time $t \in T$; kernel (component) densities
$\{f(\cdot \mid x_t, \gamma, \theta_{tj})\}_{j=1}^{\infty}$, which are often specified
by normal densities of a dynamic linear model; mixture distribution

$$
G_t(\cdot) = \sum_{j=1}^{\infty} \omega_j(t)\, \delta_{\theta_j(t)}(\cdot),
$$

which is formed by an infinite mixture of point-mass distributions $\delta_{\theta_j(t)}(\cdot)$
with mixture weights $\omega_j(t)$; and prior distributions $\gamma \sim \pi(\gamma)$,
$\theta_{tj} \sim G_{0t}$, $\{\omega_j(t)\}_{j=1}^{\infty} \sim \Pi$. All of the
earlier models focus on discrete time, and specify $G_t$ to be some variant of the dependent
Dirichlet process (DDP) (MacEachern, 1999, 2000, 2001), so that the mixture weights have
a stick-breaking form, with

$$
\omega_j(t) = v_j(t) \prod_{l=1}^{j-1} (1 - v_l(t)), \quad \text{and} \quad v_j(t) : T \to [0, 1]
$$

(Sethuraman, 1994). Such DDP-based time-series models either assume time-dependent
stick-breaking weights (Griffin & Steel, 2006, 2011; Rodriguez & Dunson, 2011), or assume
non-time-dependent stick-breaking weights and a time-dependent prior (baseline) distribution
$G_{0t}$ (Rodriguez & ter Horst, 2008), or assume a fully non-time-dependent Dirichlet
process (DP) $G_t = G$ with time-dependence only in the kernel densities (Hatjispyros et al.,
2009; Tang & Ghosal, 2007; Lau & So, 2008; Caron et al., 2008; Giardina et al., 2011; Di
Lucca et al., 2012). Other related approaches construct a time-dependent DDP $G_t$ either
by generalizing the Polya urn scheme of the DP (e.g., Zhu et al., 2005; Caron et al., 2007);
by a convex combination of hierarchical Dirichlet processes (HDP) or DPs (Ren et al., 2008;
Dunson, 2006); by an HDP-based hidden Markov model that has infinitely many states (Fox
et al., 2008, 2011); or by a Markov-switching model having finitely many states (Taddy &
Kottas, 2009).
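
As a minimal illustration of the stick-breaking form above at a single fixed time $t$, the
following Python sketch computes truncated weights from given stick proportions $v_j$. The
function name, the truncation at 50 sticks, and the Beta(1, 2) draws are assumptions made
purely for illustration; they are not taken from any of the cited models.

```python
import numpy as np

def stick_breaking_weights(v):
    """Return w_j = v_j * prod_{l<j} (1 - v_l) for a 1-D array of stick proportions."""
    v = np.asarray(v, dtype=float)
    # Stick length remaining before the j-th break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
v = rng.beta(1.0, 2.0, size=50)   # hypothetical stick proportions (illustrative choice)
w = stick_breaking_weights(v)     # truncated weights; their sum is at most 1
```

Under truncation the weights do not sum exactly to one; the leftover mass equals the
product of the $(1 - v_l)$ over the retained sticks.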

The more recent work on Bayesian nonparametric time series modeling has focused on 
continuous time, and on developing a time-dependent mixture distribution that has the 
general form, 

$$
G_t = \sum_{j=1}^{\infty} \omega_j(t)\, \delta_{\theta_j}(\cdot), \qquad (1)
$$

based on a process other than the DDP. Above, a baseline prior
$\theta_j \sim_{\mathrm{iid}} G_0$ is assumed, which is a standard assumption.

In Section 2 we describe two such continuous time series models, namely the geometric
model (Mena, Ruggiero, & Walker, 2011) and a normalized random measure (NRM) model
(Griffin, 2011). In Section 3 we highlight a key property that such models should possess,
identify the one type of model that has this property, and prove that the existing
continuous time series models do not have it.



2 Continuous time models 

The geometric model constructs a dependent process $G_t$ using time-dependent geometric
mixture weights

$$
\omega_j(t) = \lambda_t (1 - \lambda_t)^{j-1}, \qquad (2)
$$

with $\lambda_t$ specified as a two-type Wright-Fisher diffusion (Mena, Ruggiero, & Walker, 2011).

The $(\lambda_t)$ follow a stochastic process whose stationary density is beta$(a, b)$. The
transition mechanism is given, for $t > s$, by

$$
p(\lambda_t \mid \lambda_s) = \sum_{m=0}^{\infty} p_h(m)\, p(\lambda_t \mid m, \lambda_s),
$$

where $h = t - s$,

$$
p(\lambda_t \mid m, \lambda_s) = \sum_{k=0}^{m} \mathrm{beta}(\lambda_t \mid a + k, b + m - k)\, \mathrm{bin}(k \mid m, \lambda_s),
$$

and

$$
p_h(m) = \frac{(a + b)_m \exp(-mch)}{m!} \left(1 - e^{-ch}\right)^{a+b}
$$

for some $c > 0$.
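
The transition mechanism above can be simulated in three steps: draw $m$, then $k$ given
$m$ and the current value, then the new value from the beta component. The Python sketch
below does this under the assumption that $(a+b)_m$ denotes the rising factorial, in which
case $p_h(m)$ is a negative binomial law with shape $a + b$ and success probability
$1 - e^{-ch}$. The function name and the parameter values are hypothetical.

```python
import numpy as np

def sample_lambda_transition(lam_s, h, a, b, c, rng):
    """Draw lambda_t given lambda_s for a time gap h = t - s > 0 (vectorized over lam_s)."""
    lam_s = np.asarray(lam_s, dtype=float)
    p = 1.0 - np.exp(-c * h)
    # m ~ p_h(m): negative binomial with shape a + b and success probability p
    # (assumes (a + b)_m is the rising factorial)
    m = rng.negative_binomial(a + b, p, size=lam_s.shape)
    # k | m, lambda_s ~ bin(k | m, lambda_s)
    k = rng.binomial(m, lam_s)
    # lambda_t | k, m ~ beta(a + k, b + m - k)
    return rng.beta(a + k, b + m - k)

rng = np.random.default_rng(0)
lam_s = rng.beta(2.0, 3.0, size=20_000)   # start from the stationary beta(a, b) law
lam_t = sample_lambda_transition(lam_s, h=0.5, a=2.0, b=3.0, c=1.0, rng=rng)
```

Starting from beta$(a, b)$ draws, the sample mean of `lam_t` should stay close to
$a / (a + b)$, which gives a quick check that the stationary density is preserved.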

Hence $G_t$ is a continuous-time process, and its properties are studied in Mena, Ruggiero,
and Walker (2011).

The normalized random measures (NRM) model constructs a time-dependent process $G_t$
using time-dependent mixture weights that are formed by normalizing a stochastic process
derived from non-Gaussian Ornstein-Uhlenbeck processes (Griffin, 2011). Specifically, these
weights are constructed by

$$
\omega_j(t) = \frac{1(\tau_j \le t) \exp(-\lambda (t - \tau_j))\, J_j}{\sum_{l=1}^{\infty} 1(\tau_l \le t) \exp(-\lambda (t - \tau_l))\, J_l}, \qquad (3)
$$

where $(\tau, J)$ follows a Poisson process with intensity $\lambda w(J)$, where $w$ is a Lévy density.

Details and examples of obtaining the $(\tau_j, J_j)$ are provided by Griffin (2011). Aside from
the specific examples considered in that paper, we also note that any sequence of
$(\tau_j, J_j)$ is permissible provided

$$
\sum_{l=1}^{\infty} 1(\tau_l \le t) \exp(-\lambda (t - \tau_l))\, J_l < \infty
$$

for all $t$.
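
To make the normalization in the NRM weights concrete, the following Python sketch evaluates
them at a time $t$ for a finite, hand-picked set of $(\tau_j, J_j)$. The function name and the
numerical values are assumptions for illustration only; in particular, the pairs are not drawn
from a Poisson process here.

```python
import numpy as np

def nrm_weights(t, tau, J, lam):
    """Normalized weights at time t from arrival times tau and jump sizes J."""
    tau = np.asarray(tau, dtype=float)
    J = np.asarray(J, dtype=float)
    active = tau <= t                      # indicator 1(tau_j <= t)
    # Clip the time lag at zero so atoms that have not yet arrived cannot overflow exp
    decay = np.exp(-lam * np.clip(t - tau, 0.0, None))
    unnorm = np.where(active, decay * J, 0.0)
    total = unnorm.sum()
    if total <= 0.0:
        raise ValueError("no atom has arrived by time t")
    return unnorm / total

# Hypothetical atoms: the third arrives after t = 2 and therefore has weight zero
w = nrm_weights(2.0, tau=[0.0, 1.0, 3.0], J=[1.0, 2.0, 1.0], lam=1.0)
```

In this example the mass at time 2 is shared between the first two atoms, so no single
weight equals one.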

3 A key property

Using the mixture model

$$
\int K(y \mid \theta)\, G_t(d\theta) = \sum_{j=1}^{\infty} \omega_j(t)\, K(y \mid \theta_j),
$$

we insist on the natural requirement that, for all suitably small $h$, the observations $y_t$
and $y_{t+h}$ arise from the same component. This requirement is clearly not met by simply
insisting that $G_{t+h} \to G_t$ as $h \to 0$.

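So, in this paper, we introduce the argument that a Bayesian nonparametric continuous
time series model should have a certain property. Specifically, based on the above discussion,
we need the property that

$$
P(\theta_t = \theta_{t+h}) \to 1 \quad \text{as } h \to 0,
$$

where $\theta_t$ denotes a sample from

$$
G_t = \sum_{j=1}^{\infty} \omega_j(t)\, \delta_{\theta_j},
$$

i.e. that $\theta_t \mid G_t \sim G_t$, which means that $P(\theta_t = \theta_j) = \omega_j(t)$.
Now it can be shown that

$$
P(\theta_t = \theta_{t+h}) = \sum_{j=1}^{\infty} P(\theta_t = \theta_{t+h} = \theta_j),
$$

and hence we are asking for

$$
E\left[ \sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t + h) \right] \to 1 \quad \text{as } h \to 0.
$$

For this, it is necessary that

$$
D(h) = \sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t + h) \to 1 \quad \text{in probability as } h \to 0.
$$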



Now assume that

$$
\sup_j |\omega_j(t + h) - \omega_j(t)| \to 0 \quad \text{a.s. as } h \to 0,
$$

which is an extremely mild condition.
Hence, for any $\epsilon > 0$,

$$
\sup_j |\omega_j(t + h) - \omega_j(t)| < \epsilon
$$

for all small enough $h$. Therefore, for all small $h$, we have

$$
D(h) \le \sum_{j=1}^{\infty} \omega_j^2(t) + \epsilon \quad \text{a.s.}
$$

The only way we can now recover the convergence to 1 in probability is that

$$
\omega_j(t) = 1 \quad \text{a.s.}
$$

for a particular $j$, which will depend on $t$.
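
The intermediate algebra behind this bound and the conclusion drawn from it can be sketched
as follows. This is our own filling-in of the steps, using only the assumption
$\sup_j |\omega_j(t + h) - \omega_j(t)| < \epsilon$ and the fact that the weights sum to one.

```latex
\begin{aligned}
D(h) &= \sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t+h)
      \le \sum_{j=1}^{\infty} \omega_j(t) \bigl( \omega_j(t) + \epsilon \bigr)
      = \sum_{j=1}^{\infty} \omega_j^2(t) + \epsilon, \\
\sum_{j=1}^{\infty} \omega_j^2(t)
     &\le \Bigl( \max_j \omega_j(t) \Bigr) \sum_{j=1}^{\infty} \omega_j(t)
      = \max_j \omega_j(t) \le 1.
\end{aligned}
```

Since $\epsilon$ is arbitrary, $D(h) \to 1$ requires $\sum_j \omega_j^2(t) = 1$, and by the
second line this holds only when $\max_j \omega_j(t) = 1$, that is, when a single weight
carries all the mass.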

Hence, we believe that a Bayesian nonparametric continuous time series model should
specify a time-dependent mixture distribution $G_t$ of the type given in (1), where

$$
\omega_j(t) = 1(t \in A_j),
$$

and the $(A_j)_j$ form a random partition of $(0, \infty)$. In other words, we recommend Bayesian
nonparametric change-point models for time series analysis. Specifically, let
$\mathcal{D} = \{y_{t_i}\}_{i=1}^{n}$ denote a sample of data consisting of $n$ dependent
responses $y_{t_i}$ observed at time points $t_i$. Then, such a model may be specified as:

$$
\begin{aligned}
y_{t_i} &\sim f(y_{t_i} \mid \theta_{z(t_i)}), \quad i = 1, \ldots, n, && (4a) \\
z(t_i) &= j \iff \tau_{j-1} < t_i < \tau_j = (\tau_{j-1} + \epsilon_j), && (4b) \\
\epsilon_j &\sim \mathrm{Ex}(\lambda), \quad j = 1, 2, \ldots, && (4c) \\
\theta_j &\sim G_0, \quad j = 1, 2, \ldots, && (4d)
\end{aligned}
$$

where $z(t_i)$ denotes the random component index, and the gaps
$\epsilon_j = \tau_j - \tau_{j-1}$ are i.i.d. from an exponential $\mathrm{Ex}(\lambda)$ prior
distribution, with $\tau_0 := 0$. The exponential distribution for creating the intervals is
not essential, but there seems little reason to make it more complicated.
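
A forward simulation from the change-point model (4a)-(4d) may help fix ideas. The Python
sketch below is illustrative only: the normal kernel $f$, the normal baseline $G_0$, the
function name, and all parameter values are our assumptions, since the model leaves $f$ and
$G_0$ unspecified. Observation times are taken to be strictly positive.

```python
import numpy as np

def simulate_change_point_series(times, lam, rng, theta_sd=3.0, sigma=1.0):
    """Simulate (z, taus, y) from (4a)-(4d) with an assumed normal kernel and baseline."""
    times = np.asarray(times, dtype=float)   # strictly positive observation times
    # (4b)-(4c): add Ex(lam) gaps until the change points pass the last observation
    taus = [0.0]                              # tau_0 := 0
    while taus[-1] <= times.max():
        taus.append(taus[-1] + rng.exponential(1.0 / lam))
    taus = np.array(taus)
    # z(t_i) = j when tau_{j-1} < t_i <= tau_j (ties occur with probability zero)
    z = np.searchsorted(taus, times, side="left")
    # (4d): theta_j ~ G_0, here assumed N(0, theta_sd^2); index 0 is unused
    thetas = rng.normal(0.0, theta_sd, size=taus.size)
    # (4a): y_{t_i} ~ f(. | theta_{z(t_i)}), here assumed N(theta, sigma^2)
    y = rng.normal(thetas[z], sigma)
    return z, taus, y

rng = np.random.default_rng(0)
times = np.linspace(0.1, 10.0, 200)
z, taus, y = simulate_change_point_series(times, lam=0.5, rng=rng)
```

Between consecutive change points the component index `z` is constant, so nearby
observations share the same kernel parameter.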

Interestingly, neither the geometric model nor the NRM model specifies a mixing distribution
$G_t$ with weights that satisfy the key property described above. Figure 1 illustrates
this fact. Specifically, for the geometric model, the figure shows samples of the random
component index $z(t)$, drawn with $\Pr(z(t) = j) \propto \omega_j(t)$, over a convergent
sequence of times $t_l = t_{l-1} + 1/l^2$, for $l = 1, 2, \ldots, 1000$, with $t_0 = 0$.
These samples are presented for different choices of prior parameters in this model, namely
$b = 1, 10, 30, 50$, along with $a = c = 1$. As the figure shows, as $t$ converges to time
1.6439, the random variable $z(t)$ does not converge to a single value. Instead, the random
variable displays a degree of uncertainty about the component (kernel) density at that time.
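
The time grid used for the figure can be reproduced with a few lines of Python. This sketch
only checks the arithmetic of the sequence $t_l = t_{l-1} + 1/l^2$; the variable names are ours.

```python
import numpy as np

l = np.arange(1, 1001)
t = np.cumsum(1.0 / l**2)    # t_l = t_{l-1} + 1/l^2, starting from t_0 = 0
t_last = t[-1]               # last grid point, t_1000
limit = np.pi**2 / 6         # limit of the infinite sequence (Basel sum)
```

The value `t_last` rounds to 1.6439, the time quoted above, and the increments shrink like
$1/l^2$, so the last few hundred grid points are extremely close together.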

Now, we formally show that our time series model satisfies the property, whereas the
geometric model and the NRM model do not. Our model is based on

$$
\omega_j(t) = 1(t \in A_j)
$$

and

$$
A_j = (\tau_{j-1}, \tau_j)
$$

with

$$
\tau_j = \tau_{j-1} + \epsilon_j,
$$

where the $(\epsilon_j)$ are independent and identically distributed exponential random
variables with parameter $\lambda$. It is straightforward to show that

$$
\sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t + h) =
\begin{cases}
1 & \text{with probability } e^{-\lambda h}, \\
0 & \text{with probability } 1 - e^{-\lambda h}.
\end{cases}
$$

This follows since we need $t, t + h \in A_j$. Hence, it is seen that

$$
E\left[ \sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t + h) \right] = e^{-\lambda h} \to 1 \quad \text{as } h \to 0.
$$

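For the geometric model, we have

$$
E\left[ \sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t + h) \right]
$$

given by

$$
E\left[ \sum_{j=1}^{\infty} \lambda_t (1 - \lambda_t)^{j-1}\, \lambda_{t+h} (1 - \lambda_{t+h})^{j-1} \right],
$$

which is

$$
E\left[ \frac{\lambda_t \lambda_{t+h}}{\lambda_t + \lambda_{t+h} - \lambda_t \lambda_{t+h}} \right].
$$

This is strictly less than one due to the fact that $\lambda_t$ and $\lambda_{t+h}$ are less than 1.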
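
The closed form inside the expectation follows from summing a geometric series with ratio
$(1 - \lambda_t)(1 - \lambda_{t+h})$. The Python sketch below checks it numerically against a
truncated version of the sum; the function names, the truncation length, and the test values
are hypothetical.

```python
import numpy as np

def geometric_match_prob(lam_t, lam_th):
    """Closed form of sum_j w_j(t) w_j(t+h) for the geometric weights."""
    return lam_t * lam_th / (lam_t + lam_th - lam_t * lam_th)

def geometric_match_prob_series(lam_t, lam_th, n_terms=10_000):
    """Same quantity from a truncated version of the infinite sum."""
    e = np.arange(n_terms)    # exponents j - 1 for j = 1, 2, ...
    return np.sum(lam_t * (1 - lam_t) ** e * lam_th * (1 - lam_th) ** e)

closed = geometric_match_prob(0.3, 0.4)
series = geometric_match_prob_series(0.3, 0.4)
same_time = geometric_match_prob(0.5, 0.5)   # the case lam_{t+h} = lam_t = 0.5
```

When the two values coincide at a common $\lambda$, the expression reduces to
$\lambda / (2 - \lambda)$, which is below one for every $\lambda < 1$; this is why the
quantity cannot approach one merely because $\lambda_{t+h}$ approaches $\lambda_t$.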
Finally, the NRM model also has

$$
E\left[ \sum_{j=1}^{\infty} \omega_j(t)\, \omega_j(t + h) \right] < 1,
$$

and this result follows from the proof of Theorem 2 of Griffin (2011), which appears in the
Appendix of that paper.



4 Discussion 

In summary, we advocate a specific property for mixture models for continuous time series.
Namely, as $h \to 0$, the model should identify, with probability approaching one, a single
component index $z(t)$ shared by times $t$ and $t + h$, and hence a single component density
$f(y_t \mid \theta_{z(t)})$ of the dependent response $Y_t$. In other words, there is no strong
reason why one should specify a time-series model that allows the component density to change
drastically as time moves through incrementally smaller steps. In essence, we are not merely
asking for $G_t$ to be close to $G_{t+h}$, which is given but is a rather weak condition;
rather, we are asking that $\theta_t$ and $\theta_{t+h}$ coincide with a probability that
approaches 1 as $h \to 0$.

Interestingly, we have shown that two Bayesian nonparametric (infinite-mixture) models
fail this sensible property. In contrast, we have shown that for a mixture model to satisfy the
property, it must be of the form given in equation (4). This implies that the mixture model
must be a Bayesian multiple change-point model (e.g., Barry & Hartigan, 1993; Chib, 1998),
having infinitely many change-point parameters $\tau_{j-1} < \tau_j$, $j = 1, 2, \ldots$. These
results may encourage future developments in Bayesian nonparametric models for continuous time
series in the direction of multiple change-point modeling.

Figure 1: For the geometric time series model, the log of samples of the component index
$z(t)$, drawn with $\Pr(z(t) = j) \propto \omega_j(t)$, over a convergent sequence of times
$t_l = t_{l-1} + 1/l^2$, for $l = 1, 2, \ldots, 1000$, with $t_0 = 0$. The component index
samples are shown for a range of choices of prior parameters, $b = 1, 10, 30, 50$, along with
$a = c = 1$.